Using Multiset Discrimination to Solve Language Processing Problems Without Hashing
Authors
Abstract
It is generally assumed that hashing is essential to solve many language processing problems efficiently, e.g., symbol table formation and maintenance, grammar manipulation, basic block optimization, and global optimization. This paper questions this assumption and initiates development of an efficient alternative compiler methodology without hashing or sorting. The methodology rests on efficient solutions to the basic problem of detecting duplicate values in a multiset, which we call multiset discrimination. Paige and Tarjan [22] gave an efficient solution to multiset discrimination for detecting duplicate elements occurring in a multiset of varying-length strings. The technique was used to develop an improved algorithm for lexicographic sorting, whose importance stems largely from its use in solving a variety of isomorphism problems [2]. The current paper and a related paper [23] show that full lexicographic sorting is not needed to solve these isomorphism problems, because they can be solved more efficiently using straightforward extensions to the simpler multiset discrimination technique. By reformulating language processing problems in terms of multiset discrimination, we also show how almost every subtask of compilation can be solved without hashing in worst-case running time no worse (and frequently better) than the best previous expected-time solution (under the assumption that one hash operation takes unit expected time). Because of their simplicity, our solutions may be of practical as well as theoretical interest. The various applications presented culminate with a new algorithm to solve iterated strength reduction folded with useless code elimination that runs in worst-case asymptotic time and auxiliary space Θ(|L| + |L*|), where |L| and |L*| represent the lengths of the initial and optimized programs respectively. The previous best solution, due to Cocke and Kennedy, takes Ω(|L|·|L*|) hash operations.
1. A preliminary version of this paper appeared in the Conference Record of the Eighteenth Annual ACM Symposium on Principles of Programming Languages [6]. Part of this work was done while both authors were visiting the University of Wisconsin at Madison, and while R. Paige was visiting DIKU at the University of Copenhagen.
2. The research of this author was partially supported by National Science Foundation Grant No. CCR-9002428 and by Air Force Office of Scientific Research Grant No. AFOSR-91-0308.
3. The research of this author was partially supported by Office of Naval Research Grant No. N00014-93-1-1036 and by Air Force Office of Scientific Research Grant No. AFOSR-91-0308.
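The sketch below illustrates the basic idea of multiset discrimination on strings. It groups equal strings without hashing or sorting. It is a minimal Python rendering in the spirit of the technique the abstract describes, not the authors' exact algorithm.

Blocks of candidate-equal strings are refined one character position at a time. Each block is distributed into a single reusable array of buckets indexed by character code. Only the buckets actually touched are visited and reset, so the array is allocated once and the work stays proportional to the characters examined plus the alphabet size.

The following are assumptions for illustration:
- the names `discriminate`, `number_identifiers` and `alphabet_size` are hypothetical;
- every character is assumed to have a code point below `alphabet_size`.

```python
from typing import List, Tuple


def discriminate(strings: List[str], alphabet_size: int = 256) -> List[List[int]]:
    """Partition the indices of `strings` into classes of equal strings.

    Hypothetical sketch: uses array buckets indexed by character code,
    never a hash table or a comparison sort. Assumes ord(ch) < alphabet_size.
    """
    buckets: List[List[int]] = [[] for _ in range(alphabet_size)]  # allocated once
    classes: List[List[int]] = []
    # Work stack of (block of string indices, number of leading chars known equal).
    work: List[Tuple[List[int], int]] = [(list(range(len(strings))), 0)]
    while work:
        block, pos = work.pop()
        if len(block) == 1:           # a singleton block is already a class
            classes.append(block)
            continue
        ended: List[int] = []         # strings of length pos: all equal in this block
        touched: List[int] = []       # bucket indices used, in first-touch order
        for idx in block:
            s = strings[idx]
            if pos == len(s):
                ended.append(idx)
            else:
                c = ord(s[pos])
                if not buckets[c]:
                    touched.append(c)
                buckets[c].append(idx)
        if ended:
            classes.append(ended)
        for c in touched:             # visit only nonempty buckets, no scan of the array
            work.append((buckets[c], pos + 1))
            buckets[c] = []           # reset the bucket for reuse by later blocks
    return classes


def number_identifiers(names: List[str]) -> List[int]:
    """Give equal names the same small integer, as in symbol table formation."""
    ids = [0] * len(names)
    for k, cls in enumerate(discriminate(names)):
        for i in cls:
            ids[i] = k
    return ids


print(number_identifiers(["x", "y", "x", "xy", "y"]))  # → [1, 0, 1, 2, 0]
```

The numbering of classes depends only on the order in which buckets are first touched, not on any ordering of the strings. This reflects the point above that duplicate detection does not require full lexicographic sorting.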
Similar resources
Multiset Discrimination − a Method for Implementing Programming Language Systems Without Hashing
It is generally assumed that hashing is essential to many algorithms related to efficient compilation; e.g., symbol table formation and maintenance, grammar manipulation, basic block optimization, and global optimization. This paper questions this assumption, and initiates development of an efficient alternative compiler methodology without hashing or sorting. Underlying this methodology are se...
Compressed Image Hashing using Minimum Magnitude CSLBP
Image hashing allows compression, enhancement or other signal processing operations on digital images which are usually acceptable manipulations. Whereas, cryptographic hash functions are very sensitive to even single bit changes in image. Image hashing is a sum of important quality features in quantized form. In this paper, we proposed a novel image hashing algorithm for authentication which i...
Efficient Translation of External Input in a Dynamically Typed Language
New algorithms are given to compile external data in string form into data structures for high level datatypes. Let I be a language of external constants formed from atomic constants and from set, multiset, and tuple constructors. We show how to read an input string C, decide whether it belongs to I, convert it to internal form, and build initial data structures storing the internal value of C ...
Exact Algorithms for Set Multicover and Multiset Multicover Problems
Given a universe N containing n elements and a collection of multisets or sets over N , the multiset multicover (MSMC) or the set multicover (SMC) problem is to cover all elements at least a number of times as specified in their coverage requirements with the minimum number of multisets or sets. In this paper, we give various exact algorithms for these two problems, with or without constraints ...
A Comparison of Spectral Methods for Spoken Language Identification
Identifying spoken language automatically is to identify a language from the speech signal. Language identification systems can be divided into two categories, spectral-based methods and phonetic-based methods. In the former, short-time characteristics of speech spectrum are extracted as a multi-dimensional vector. The statistical model of these features is then obtained for each language. The ...
Journal title:
- Theor. Comput. Sci.
Volume 145, Issue
Pages -
Publication date: 1995